    Substructural Analysis Using Evolutionary Computing Techniques

    Substructural analysis (SSA) was one of the first machine learning techniques applied to chemoinformatics in the area of virtual screening. Given a set of compounds, typically described by fragment occurrence data (such as 2D fingerprints), SSA computes a weight for each fragment that reflects its contribution to the activity (or inactivity) of the compounds containing that fragment. The overall probability of activity for a compound is then computed by summing or otherwise combining the weights of the fragments present in that compound. A variety of weighting schemes, each based on a specific equation, are available for this purpose. This thesis seeks to improve the effectiveness of SSA using two evolutionary computation methods: the genetic algorithm (GA) and genetic programming (GP). Building on previous studies, ten published SSA weighting schemes were analysed and compared in a simulated virtual screening experiment. The analysis showed the most effective weighting scheme to be the R4 equation, one of the document-based weighting schemes. A second experiment investigated a GA-based weighting scheme for SSA in comparison with the R4 weighting scheme. The GA is simple in concept, focusing purely on generating suitable weights, and effective in operation. The findings show that the GA-based SSA is superior to the R4-based SSA in terms of both active compound retrieval rate and predictive performance. A third experiment investigated a GP-based SSA. Rigorous experiments showed the GP to be superior to the existing SSA weighting schemes, although, in general, the GP-based SSA was less effective than the GA-based SSA. A final experiment described in this thesis explored the feasibility of data fusion for both the GA- and GP-based approaches; data fusion produces a final ranking from multiple ranked lists according to one of several fusion rules. The results indicate that data fusion is a good method for boosting GA- and GP-based SSA searching, with the RKP rule proving the most effective fusion rule.
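    As a rough illustration of the scoring step described above, here is a minimal Python sketch in which a compound's score is the sum of its fragment weights. The log-ratio weighting used here is a generic stand-in, not the thesis's R4 equation, and all names and data are illustrative.

```python
# Minimal SSA scoring sketch: a compound's score is the sum of the
# weights of the fragments present in its fingerprint. The weighting
# (log of smoothed active/inactive occurrence counts) is a generic
# illustration, not the R4 equation from the thesis.
import math

def fragment_weights(actives, inactives, n_fragments):
    """Compute one weight per fragment from occurrence counts.

    actives / inactives: lists of fingerprints, each a set of fragment ids.
    """
    weights = []
    for f in range(n_fragments):
        act = sum(1 for fp in actives if f in fp) + 1     # +1 smoothing
        inact = sum(1 for fp in inactives if f in fp) + 1
        weights.append(math.log(act / inact))
    return weights

def score(fingerprint, weights):
    """Sum the weights of the fragments present in the compound."""
    return sum(weights[f] for f in fingerprint)

# A screening library would then be ranked by decreasing score:
# ranked = sorted(library, key=lambda fp: score(fp, weights), reverse=True)
```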

    Pedestrian Detection using Triple Laser Range Finders

    Pedestrian detection is an important capability of autonomous ground vehicles (AGVs), as it enables safe navigation in urban environments. Detection accuracy is therefore crucial, which motivates the use of laser range finders (LRFs) for better data representation. In this study, an improved laser configuration and fusion technique is introduced that deploys three LRFs in two layers, together with Pedestrian Data Analysis (PDA), to recognize multiple pedestrians. The PDA integrates features from the feature extraction process for all clusters and fuses the multiple layers for better recognition. Experiments were conducted in various occlusion scenarios, such as intersection, close-pedestrian, and combined scenarios. Analysis of the laser fusion and PDA across all scenarios showed improved detection: pedestrians were represented by various detection categories, which resolves occlusion issues when only a small number of laser points is obtained.
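    The clustering-and-fusion step might look roughly like the following sketch, which groups each layer's scan points into clusters and pairs clusters across layers by centroid distance. The gap and gating thresholds are assumptions for illustration, not values from the paper.

```python
# Minimal two-layer LRF fusion sketch: cluster each layer's scan points
# by inter-point spacing, then pair clusters across layers whose
# centroids fall within a gating radius. Thresholds are illustrative.
import math

def cluster(points, gap=0.3):
    """Group consecutive scan points whose spacing is below `gap` (metres)."""
    clusters, current = [], [points[0]]
    for p, q in zip(points, points[1:]):
        if math.dist(p, q) < gap:
            current.append(q)
        else:
            clusters.append(current)
            current = [q]
    clusters.append(current)
    return clusters

def centroid(c):
    xs, ys = zip(*c)
    return (sum(xs) / len(c), sum(ys) / len(c))

def fuse_layers(layer1, layer2, radius=0.5):
    """Pair clusters across layers whose centroids lie within `radius`."""
    fused = []
    for c1 in cluster(layer1):
        for c2 in cluster(layer2):
            if math.dist(centroid(c1), centroid(c2)) < radius:
                fused.append((c1, c2))  # candidate pedestrian seen in both layers
    return fused
```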

    Machine Learning Approach for Bottom 40 Percent Households (B40) Poverty Classification

    Malaysian citizens are categorised into three income groups: the Top 20 Percent (T20), Middle 40 Percent (M40), and Bottom 40 Percent (B40). One of the focus areas of the Eleventh Malaysia Plan (11MP) is to elevate the B40 household group towards middle-income society. Based on recent studies by the World Bank, Malaysia is expected to attain high-income economy status no later than the year 2024. It is therefore essential to identify the B40 population through predictive classification as a prerequisite for developing a comprehensive government action plan. This paper aims to identify the best machine learning model among Naive Bayes, Decision Tree, and k-Nearest Neighbors for classifying the B40 population. Several data pre-processing tasks, namely data cleaning, feature engineering, normalisation, feature selection (Correlation Attribute, Information Gain Attribute, and Symmetrical Uncertainty Attribute), and sampling using SMOTE, were applied to the raw dataset to ensure the quality of the training data. Each classifier was then optimised over different tuning parameters with 10-fold cross-validation before the performance of the three classifiers was compared. For the experiments, a dataset from the National Poverty Data Bank, eKasih, obtained from the Society Wellbeing Department, Implementation Coordination Unit of the Prime Minister's Department (ICU JPM), consisting of 99,546 households from 3 states (Johor, Terengganu, and Pahang), was used to train each machine learning model. The experimental results using 10-fold cross-validation demonstrate that the Decision Tree model outperformed the other models, and a significance test confirmed that the result is statistically significant.
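    A minimal sketch of the described pipeline, assuming scikit-learn and imbalanced-learn as stand-ins for the tooling actually used: SMOTE is applied inside the cross-validation loop so that synthetic samples never leak into test folds. The synthetic data and parameter values are placeholders, not the eKasih dataset or the paper's tuned settings.

```python
# SMOTE + decision tree under 10-fold cross-validation. Using
# imblearn's Pipeline ensures SMOTE only resamples the training folds.
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(1000, 10)                  # placeholder features
y = (np.random.rand(1000) > 0.8).astype(int)  # imbalanced placeholder labels

pipe = Pipeline([
    ("scale", MinMaxScaler()),                # normalisation
    ("smote", SMOTE(random_state=42)),        # oversample the minority class
    ("tree", DecisionTreeClassifier(max_depth=8, random_state=42)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="f1")
print(f"10-fold F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```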

    Time Series Prediction of Bitcoin Cryptocurrency Price Based on Machine Learning Approach

    Over the past few years, Bitcoin has attracted the attention of numerous parties, ranging from academic researchers to institutional investors. Bitcoin is the first and most widely used cryptocurrency to date. Because of the significant volatility of its price and the fact that trading it does not require a third party, it has gained great popularity among a wide range of individuals since its inception in 2009. Given the previous difficulties in predicting the prices of cryptocurrencies, this project develops and implements a time series prediction model using machine learning algorithms, including Support Vector Machine Regression (SVR), K-Nearest Neighbor Regression (KNN), Extreme Gradient Boosting (XGBoost), and Long Short-Term Memory (LSTM), to determine the trend of Bitcoin price movement and to assess the effectiveness of these models. The data used are the closing prices of Bitcoin from 2018 to 2023. The performance of the models is evaluated by comparing R-squared, mean absolute error (MAE), and root mean squared error (RMSE), and through a dashboard visualization of the original and predicted closing prices. Among the models compared, LSTM emerged as the most accurate, followed by SVR, while XGBoost and KNN exhibited comparatively lower performance.
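    To make the windowing-and-evaluation setup concrete, here is a minimal sketch using one of the named models (SVR); the window length, hyperparameters, and synthetic price series are illustrative assumptions, not the project's actual configuration.

```python
# Sliding-window regression on a closing-price series, scored with the
# three metrics named in the abstract: R^2, MAE, and RMSE.
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

def make_windows(prices, window=30):
    """Predict the next close from the previous `window` closes."""
    X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
    y = np.array(prices[window:])
    return X, y

prices = np.cumsum(np.random.randn(2000)) + 30000   # placeholder series
X, y = make_windows(prices)
split = int(0.8 * len(X))                           # chronological split

model = SVR(kernel="rbf", C=100.0)
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])

print("R2  :", r2_score(y[split:], pred))
print("MAE :", mean_absolute_error(y[split:], pred))
print("RMSE:", np.sqrt(mean_squared_error(y[split:], pred)))
```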

    Redefining Selection of Features and Classification Algorithms for Room Occupancy Detection

    The exponential growth of today's technologies has resulted in high-throughput data of ever-greater dimensionality and sample size. Efficient and effective supervision of these data therefore becomes increasingly challenging, and machine learning techniques have been developed for knowledge discovery and pattern recognition in such data. This paper presents a machine learning tool for preprocessing tasks and a comparative study of different classification techniques, in an experimental setup using a dataset archived at the UCI Machine Learning Repository. The objective is to analyse the impact of refined feature selection on different classification algorithms in order to improve the accuracy of room occupancy prediction. Subsets of the original features, constructed by filter (information gain) and wrapper techniques, are compared in terms of the classification performance achieved with selected machine learning algorithms. Three feature selection algorithms are tested, specifically Information Gain Attribute Evaluation (IGAE), Correlation Attribute Evaluation (CAE), and Wrapper Subset Evaluation (WSE). Following the refined feature selection stage, three machine learning algorithms are compared: the Multi-Layer Perceptron (MLP), Logistic Model Trees (LMT), and Instance Based k (IBk). Based on the feature analysis, WSE was found to be optimal in identifying relevant features. Feature selection is applied with the aim of achieving higher accuracy, and the experimental results demonstrate the effectiveness of IBk over the other classifiers, providing the highest rate of room occupancy prediction.
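    A minimal sketch of the filter-versus-wrapper comparison, assuming scikit-learn equivalents: mutual information for the information gain filter, and sequential selection wrapped around k-NN (the closest scikit-learn analogue of Weka's IBk). Data and parameter choices are placeholders, not the occupancy dataset or the paper's settings.

```python
# Compare a filter (information gain / mutual information) against a
# wrapper (sequential feature selection around k-NN).
from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectKBest, mutual_info_classif,
                                       SequentialFeatureSelector)
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=0)
knn = KNeighborsClassifier(n_neighbors=5)

# Filter: rank features by mutual information, keep the top 4.
X_filter = SelectKBest(mutual_info_classif, k=4).fit_transform(X, y)

# Wrapper: greedily add features that improve k-NN's CV score.
wrapper = SequentialFeatureSelector(knn, n_features_to_select=4).fit(X, y)
X_wrap = wrapper.transform(X)

for name, Xs in [("filter", X_filter), ("wrapper", X_wrap)]:
    acc = cross_val_score(knn, Xs, y, cv=10).mean()
    print(f"{name}: 10-fold accuracy = {acc:.3f}")
```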

    Malay version of the mHealth App Usability Questionnaire (M-MAUQ): translation, adaptation, and validation study

    Background: Mobile health (mHealth) apps play an important role in delivering education, providing advice on treatment, and monitoring patients’ health. Good usability is essential if mHealth apps are to achieve their objectives efficiently. To date, questionnaires are available to assess general system usability, but none is explicitly tailored to precisely assess the usability of mHealth apps. Hence, the mHealth App Usability Questionnaire (MAUQ) was developed in 4 versions according to the type of app (interactive or standalone) and the target user (patient or provider). The standalone MAUQ for patients comprises 3 subscales: ease of use, interface and satisfaction, and usefulness. Objective: This study aimed to translate and validate the English version of the MAUQ (standalone, for patients) into a Malay version (M-MAUQ) for future mHealth app research and use in Malaysia. Methods: Forward and backward translation and harmonization of the M-MAUQ were conducted by native Malay speakers who also spoke English as their second language. The process began with a forward translation by 2 independent translators, followed by harmonization to produce an initial translated version. Next, the backward translation was performed by another 2 translators who had never seen the original MAUQ. Lastly, harmonization was conducted among the committee members to resolve any ambiguity and inconsistency in the wording of the items, yielding the prefinal adapted questionnaire. Subsequently, content and face validation were performed with 10 experts and 10 target users, respectively. The modified kappa statistic was used to determine interrater agreement among the raters. The reliability of the M-MAUQ was assessed with 51 healthy young adult mobile phone users. Participants had to install the MyFitnessPal app and use it for 2 days to familiarize themselves with it before completing the designated task and answering the M-MAUQ. The MyFitnessPal app was selected because it is one of the most popular mHealth apps globally, is available to iPhone and Android users, and represents a standalone mHealth app. Results: The content validity indices for the relevancy and clarity of the M-MAUQ were 0.983 and 0.944, respectively, indicating good relevancy and clarity. The face validity index for understandability was 0.961, indicating that users understood the M-MAUQ. The kappa statistic for every item in the M-MAUQ indicated excellent agreement between the raters (
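    The content validity index and modified kappa reported above are commonly computed with the Polit-Beck formulas; the sketch below assumes that formulation (the paper's exact computation may differ), with purely illustrative ratings.

```python
# Item-level content validity index (I-CVI) and the modified kappa
# that adjusts it for chance agreement (assumed Polit-Beck formulation).
from math import comb

def i_cvi(ratings, relevant=(3, 4)):
    """Proportion of raters scoring the item as relevant (3 or 4 of 4)."""
    return sum(r in relevant for r in ratings) / len(ratings)

def modified_kappa(ratings, relevant=(3, 4)):
    n = len(ratings)
    a = sum(r in relevant for r in ratings)
    pc = comb(n, a) * 0.5 ** n          # probability of chance agreement
    cvi = a / n
    return (cvi - pc) / (1 - pc)

ratings = [4, 4, 3, 4, 2, 3, 4, 4, 4, 3]   # example: 10 experts, 4-point scale
print(f"I-CVI = {i_cvi(ratings):.3f}, kappa* = {modified_kappa(ratings):.3f}")
```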

    D2D-V2X-SDN: Taxonomy and Architecture towards 5G Mobile Communication System

    In the era of the information society and 5G networks, cars are extremely important mobile information carriers. To meet multi-scenario business requirements such as vehicle-assisted driving and in-vehicle entertainment, cars need to interact with the outside world; this interconnection and data transmission process is usually called vehicular communication (V2X, Vehicle-to-Everything). Device-to-device (D2D) communication not only exploits the proximity of communicating devices but also alleviates the current scarcity of spectrum resources. Applying D2D communication in V2X can meet the requirements of high reliability and low latency, but resource reuse also introduces interference. Software-defined networking (SDN) provides an optimal solution for interoperability and flexibility between V2X and D2D communication. This paper reviews the integration of D2D and V2X communication from the perspective of SDN. The state of the art and architectures of D2D-V2X are discussed, and their similarities, characteristics, routing control, location management, path scheduling, and recovery are described. The integrated architecture reviewed in this paper can solve the problems of routing management, interference management, and mobility management. It also overcomes the disconnection problem between D2D and V2X under SDN and provides some effective solutions.

    Funding: Qatar National Research Fund (QNRF) [UREP27-020-1-003]; Ministry of Higher Education, Malaysia (MOHE) [FRGS/1/2018/ICT02/UKM/02/6]; National Research Foundation of Korea (NRF) [2019R1C1C1007277]; Taif University (TU) [TURSP-2020/260].

    iDietScore™: Meal recommender system for athletes and active individuals

    Individualized meal planning is a nutrition counseling strategy that focuses on improving food behavior changes. In the sports setting, the number of sports dietitians or nutritionists (SD/SN) is small, yet the demand for meal planning for a vast number of athletes often cannot be met. Although some food recommender systems have been proposed to provide healthy menu planning for the general population, no similar solution focuses on athletes' needs. In this study, the iDietScore™ architecture was proposed to give athletes and active individuals virtual individualized meal plans based on their profiles, including energy and macronutrient requirements, sports category, age group, training cycle, training time, and individual food preferences. Knowledge acquisition in the expert domain (the SN) was conducted prior to the system design through semistructured interviews to understand the workflow of meal planning activities. The architecture comprises (1) the iDietScore™ web for the SN/SD, (2) a mobile application for athletes and active individuals, and (3) an expert system. The SN/SD use the iDietScore™ web to develop meal plans and populate the meal plan database for further use by the expert system, while users receive their virtual individualized meal plans through the iDietScore™ mobile app. An inference-based expert system was applied in the current study to generate the meal plan recommendation and meal reconstruction for the user. Further research is necessary to evaluate the prototype.
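    As a loose illustration of an inference-style recommendation step, the sketch below selects the stored meal plan that best matches a user profile. The fields, rules, tolerance, and tiny in-memory "database" are all hypothetical, not the actual iDietScore™ design.

```python
# Rule-based meal plan selection: filter stored plans by matching sport
# and training cycle, then pick the plan with the closest energy value.
from dataclasses import dataclass

@dataclass
class MealPlan:
    kcal: int
    sport: str          # e.g. "endurance", "strength"
    cycle: str          # e.g. "pre-competition", "off-season"

PLANS = [
    MealPlan(2500, "endurance", "pre-competition"),
    MealPlan(3200, "strength", "off-season"),
    MealPlan(2800, "endurance", "off-season"),
]

def recommend(kcal_needed, sport, cycle, tolerance=200):
    """Return the matching plan closest in energy, or None."""
    candidates = [p for p in PLANS
                  if p.sport == sport and p.cycle == cycle
                  and abs(p.kcal - kcal_needed) <= tolerance]
    return min(candidates, key=lambda p: abs(p.kcal - kcal_needed),
               default=None)

print(recommend(2600, "endurance", "pre-competition"))
```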

    Ensemble learning for multidimensional poverty classification

    The poverty rate in Malaysia is determined through financial or income indices and measurements. Periodic measurements are conducted through the Household Expenditure and Income Survey (HEIS) twice every five years and subsequently used to generate a Poverty Line Income (PLI), from which poverty levels are determined through statistical methods. Such a uni-dimensional measurement, however, is unable to portray overall deprivation conditions, especially the experience of the urban population. The United Nations Development Programme (UNDP) has introduced a set of multi-dimensional poverty measurements, but it has yet to be applied in the case of Malaysia. In view of this, Machine Learning (ML) approaches that can produce new poverty measurement methods are of interest, enabled by the existence of rich poverty databases such as eKasih, maintained by the Malaysian Government. The goal of this study was to determine whether an ensemble learning method (random forest) can classify poverty, and hence produce a multidimensional poverty indicator, compared with base learner methods, using the eKasih dataset. The CRoss Industry Standard Process for Data Mining (CRISP-DM) was followed to ensure that the data mining and ML processes were conducted properly. Besides random forest, decision tree and generalized linear methods were also examined to benchmark performance and determine the method with the highest accuracy. Fifteen variables were then ranked using the varImp method to identify the important variables. The analysis showed that Per Capita Income, State, Ethnic, Strata, Religion, Occupation, and Education were the most important variables in the classification of poverty, with the random forest algorithm achieving 99% accuracy.
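    A minimal sketch of the ensemble step, assuming scikit-learn's feature_importances_ as the analogue of R's varImp. The variable names echo those reported above, but the data are synthetic placeholders, not eKasih.

```python
# Fit a random forest and rank variables by importance.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

names = ["per_capita_income", "state", "ethnic", "strata", "religion",
         "occupation", "education"]
X, y = make_classification(n_samples=2000, n_features=len(names),
                           n_informative=5, random_state=1)

rf = RandomForestClassifier(n_estimators=500, random_state=1).fit(X, y)
print("10-fold CV accuracy:", cross_val_score(rf, X, y, cv=10).mean())

# Importance ranking, highest first (varImp analogue).
for name, imp in sorted(zip(names, rf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:>18}: {imp:.3f}")
```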